knitr::opts_chunk$set(echo = TRUE)

Tasks

Task 1

Obtain the working directory.

getwd()

Task 2

Read in the SPRUCE.csv data.

spruce.df <- read.csv("SPRUCE.csv")
head(spruce.df)
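read.csv() looks up a relative file name such as "SPRUCE.csv" in the working directory reported by getwd(). Because of that, it can help to check the file and its columns before fitting anything. The sketch below writes a small CSV with made-up values to a temporary file (the values are hypothetical and are not the SPRUCE data), reads it back, and checks that the two columns used later, BHDiameter and Height, are present and numeric.

```r
# Hypothetical example values; these are NOT the real SPRUCE.csv data.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("BHDiameter,Height",
             "10.2,14.1",
             "15.5,16.8",
             "21.0,19.3"), demo_path)

# A relative name like "SPRUCE.csv" would be resolved against getwd();
# here the absolute path of the temporary file is used instead.
stopifnot(file.exists(demo_path))
demo.df <- read.csv(demo_path)

# Check that the columns used in the later tasks exist and are numeric.
needed <- c("BHDiameter", "Height")
stopifnot(all(needed %in% names(demo.df)),
          all(vapply(demo.df[needed], is.numeric, logical(1))))
str(demo.df)
```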

Task 3

Plot and interpret the spruce data.

with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)),
       cex = 1.2, main = "Spruce Height vs Breast Height Diameter")
})

The relationship between BHDiameter and Height appears roughly linear and positive, although the points scatter noticeably around any straight line.

library(s20x)
# Scatter plots with a smoothed trend curve; f sets the smoother span
trendscatter(Height ~ BHDiameter, f = 0.5, data = spruce.df)
trendscatter(Height ~ BHDiameter, f = 0.6, data = spruce.df)
trendscatter(Height ~ BHDiameter, f = 0.7, data = spruce.df)

# Fit the least-squares line and add it to the scatter plot
spruce.lm <- lm(Height ~ BHDiameter, data = spruce.df)
with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)),
       cex = 1.2, main = "Spruce Height vs Breast Height Diameter")
})
abline(spruce.lm)

The straight line does not follow the data closely. The trendscatter curve fits the points better, which suggests some curvature in the relationship, although more observations at small diameters would make this easier to judge.
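The comparison between a straight line and a smoothed trend curve can be reproduced in base R. The sketch below assumes that trendscatter's f argument behaves like the span of a lowess smoother (an assumption, not confirmed here), and it uses R's built-in cars data in place of the spruce data. Larger f values average over more neighbouring points and give a smoother curve.

```r
# Built-in 'cars' data stands in for the spruce data.
fit <- lm(dist ~ speed, data = cars)

# lowess() with smoother span f; a larger f gives a smoother curve.
smooth_05 <- lowess(cars$speed, cars$dist, f = 0.5)
smooth_07 <- lowess(cars$speed, cars$dist, f = 0.7)

plot(dist ~ speed, data = cars, pch = 21, bg = "blue",
     main = "Straight line vs lowess smoothers")
abline(fit)                                   # least-squares line
lines(smooth_05, col = "red")                 # lowess, f = 0.5
lines(smooth_07, col = "darkgreen", lty = 2)  # lowess, f = 0.7
legend("topleft", legend = c("lm", "lowess f = 0.5", "lowess f = 0.7"),
       col = c("black", "red", "darkgreen"), lty = c(1, 1, 2))
```

Where the smoother bends away from the straight line, a purely linear model is likely to miss part of the pattern.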

Task 4

Determine sums of squares for the spruce data.

# 2 x 2 panel layout, filled row by row
layout(matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE))

# Panel 1: data with the fitted line
with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21, cex = 1.2,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)))
})
abline(spruce.lm)

# Panel 2: residuals (RSS), from each point to the fitted line
with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21, cex = 1.2,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)))
  segments(BHDiameter, Height, BHDiameter, fitted(spruce.lm), col = "Blue")
})
abline(spruce.lm)

# Panel 3: model deviations (MSS), from the mean height to the fitted line
with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21, cex = 1.2,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)))
  segments(BHDiameter, mean(Height), BHDiameter, fitted(spruce.lm), col = "Red")
})
abline(spruce.lm)
abline(h = mean(spruce.df$Height))

# Panel 4: total deviations (TSS), from each point to the mean height
with(spruce.df, {
  plot(Height ~ BHDiameter, bg = "Blue", pch = 21, cex = 1.2,
       ylim = c(0, 1.1 * max(Height)), xlim = c(0, 1.1 * max(BHDiameter)))
  segments(BHDiameter, Height, BHDiameter, mean(Height), col = "Green")
})
abline(h = mean(spruce.df$Height))

RSS=with(spruce.df, sum((Height-fitted(spruce.lm))^2))
RSS

MSS=with(spruce.df, sum((mean(Height)-fitted(spruce.lm))^2))
MSS

TSS=with(spruce.df, sum((Height-mean(Height))^2))
TSS

MSS/TSS

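MSS/TSS is the proportion of the total variation in Height that is explained by the regression on BHDiameter. This ratio is the coefficient of determination, R², which summary() reports as "Multiple R-squared".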
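Written out, with y_i the observed height, ŷ_i the fitted height and ȳ the mean height over the n trees, the three sums of squares computed above and their ratio are as follows. The second form of R² uses TSS = MSS + RSS, which holds for a least-squares fit with an intercept.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{MSS} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2

R^2 = \frac{\mathrm{MSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
```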

TSS
MSS+RSS

TSS equals MSS + RSS, as expected for a least-squares fit that includes an intercept.
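Printed values can hide small rounding differences, so a numerical check with all.equal() is a safer way to confirm the decomposition. The sketch below uses R's built-in cars data in place of the spruce data. It also confirms that MSS/TSS matches the R² stored by summary().

```r
# 'cars' (built into R) stands in for the spruce data.
fit <- lm(dist ~ speed, data = cars)
y <- cars$dist
y_hat <- fitted(fit)

RSS <- sum((y - y_hat)^2)          # residual (unexplained) variation
MSS <- sum((y_hat - mean(y))^2)    # variation explained by the line
TSS <- sum((y - mean(y))^2)        # total variation about the mean

# Exact equality can fail through rounding error, so use all.equal().
all.equal(TSS, MSS + RSS)
all.equal(MSS / TSS, summary(fit)$r.squared)
```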

Task 5

Describe the regression line.

summary(spruce.lm)

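The estimated slope is 0.48147: each one-unit increase in BHDiameter increases the predicted Height by about 0.48 units.

The estimated y-intercept is 9.14684.

The fitted line is therefore Height = 9.14684 + 0.48147*BHDiameter.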
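For simple linear regression the least-squares slope and intercept have closed forms: slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x). The sketch below computes them by hand on R's built-in cars data (standing in for the spruce data) and compares them with coef().

```r
fit <- lm(dist ~ speed, data = cars)
x <- cars$speed
y <- cars$dist

# Least-squares estimates from their closed-form expressions.
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
slope <- Sxy / Sxx
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)
coef(fit)   # (Intercept) and speed should match the values above
```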

predict(spruce.lm, data.frame(BHDiameter=c(15,18,20)))
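predict() evaluates the fitted line at new predictor values, so each prediction is simply intercept + slope * x. The minimal check below uses R's built-in cars data in place of the spruce data. The speeds 10, 15 and 20 are arbitrary illustration values.

```r
fit <- lm(dist ~ speed, data = cars)

# newdata must be a data frame whose column name matches the predictor.
new_x <- data.frame(speed = c(10, 15, 20))
p_model <- predict(fit, newdata = new_x)

# The same predictions computed directly from the coefficients.
b <- coef(fit)
p_manual <- b[["(Intercept)"]] + b[["speed"]] * new_x$speed

all.equal(unname(p_model), p_manual)
```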

Task 6

Create a detailed plot of the data.

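library(ggplot2)
# Points coloured by diameter, joined in order of BHDiameter,
# plus the least-squares line with its confidence band
g <- ggplot(spruce.df, aes(x = BHDiameter, y = Height, colour = BHDiameter))
g <- g + geom_point() + geom_line() +
  geom_smooth(method = "lm", formula = y ~ x)
g + ggtitle("Height vs BHDiameter")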
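With method = "lm", geom_smooth() draws the least-squares line together with a shaded confidence band for the mean response (95% by default). A rough base-R equivalent uses predict(..., interval = "confidence"). The sketch below uses R's built-in cars data in place of the spruce data, and the 80-point grid is an arbitrary choice.

```r
fit <- lm(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 80))

# Matrix with columns fit, lwr, upr: fitted mean and 95% confidence limits.
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(dist ~ speed, data = cars, type = "n")           # empty axes first
polygon(c(grid$speed, rev(grid$speed)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("grey", alpha.f = 0.5), border = NA)
points(dist ~ speed, data = cars, pch = 21, bg = "blue")
lines(grid$speed, band[, "fit"], lwd = 2)             # fitted line
```

The band is narrowest near the mean of the predictor and widens towards the edges of the data.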

Task 7

Create a shiny interactive document with a plot of the spruce data.




draket333/MATH4753tayl0062 documentation built on Sept. 10, 2020, 11:49 a.m.